Efficient Exact Pattern-Matching in Proteomic Sequences
Authors
Abstract
This paper proposes a novel algorithm for complete exact pattern matching that targets the specific characteristics of protein sequences (an alphabet of 20 symbols) but is also highly efficient on larger alphabets. The search strategy uses large search windows, allowing multiple alignments per iteration. A new filtering heuristic, named the compatibility rule, contributes decisively to the efficiency improvement. On average, the new algorithm outperforms its best-rated competitors.
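For orientation, the problem the abstract addresses (reporting every exact occurrence of a pattern in a protein sequence) can be sketched with a standard baseline. The code below is Horspool's bad-character shift algorithm, not the algorithm proposed in the paper: the abstract gives no details of the search windows or the compatibility rule, so none are reproduced here. The names `AMINO_ACIDS` and `horspool_search` are illustrative assumptions.

```python
from typing import List

# Assumption: the 20 standard amino-acid one-letter codes form the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every exact occurrence of pattern in text.

    Baseline Horspool search; NOT the algorithm described in the abstract.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bad-character table: for each symbol, the distance from its last
    # occurrence in pattern[:-1] to the end of the pattern (default: m).
    shift = {c: m for c in AMINO_ACIDS}
    for i, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - i

    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the current alignment, then shift according to the text
        # symbol aligned with the last pattern position.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("MKVLAAGKVLA", "KVLA"))
```

With a 20-symbol alphabet the average shift is larger than for DNA's 4 symbols, which is one reason protein sequences reward algorithms tuned to the alphabet size.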
Similar resources
Inexact Pattern Matching Algorithms via Automata
Pattern matching occurs in various applications, ranging from simple text searching in word processors to identification of common motifs in DNA sequences in computational biology. The problem of exact pattern matching has been well studied, and a number of efficient algorithms exist. However, these exact pattern matching algorithms are of little help when they are applied to finding patterns in ...
An Index based Pattern Matching using Multithreading
Pattern matching, the problem of finding subsequences within a long sequence, is essential for many applications such as information retrieval, disease analysis, structural and functional analysis, logic programming, theorem proving, term rewriting and DNA computing. In computational biology, exact string matching algorithms are the essential components of DNA applications. Many databases li...
A High Performance Distributed Tool for Mining Patterns in Biological Sequences
The identification of interesting patterns (or subsequences) in biosequences has an important role in computational biology. Databases of genomic and proteomic sequences have grown exponentially, and therefore pattern discovery is a hard problem requiring clever strategies and powerful pattern languages to achieve manageable levels of efficiency. As far as we are aware, known tools are eithe...
Evaluation and Improvement of Fast Algorithms for Exact Matching on Genome Sequences
With the availability of large amounts of DNA data, exact matching of nucleotide sequences has become an important application in modern computational biology and in metagenomics. In the last decade, several efficient solutions for the exact string matching problem have been developed, and most of them are very fast in practical cases. However, when the length of the pattern is short or the alpha...
Comparison of Exact String Matching Algorithms for Biological Sequences
Exact matching of single patterns in DNA and amino acid sequences is studied. We performed an extensive experimental comparison of algorithms presented in the literature. In addition, we introduce new variations of earlier algorithms. The results of the comparison show that the new algorithms are efficient in practice.